Regret Bounds for Batched Bandits

Authors

Abstract

We present simple algorithms for batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over and extend the best known regret bounds of Gao, Han, Ren, and Zhou (NeurIPS 2019), for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and find the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.
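To illustrate the batched setting, here is a minimal, hypothetical sketch of a batched successive-elimination strategy (not the paper's algorithm): arms are pulled on a predetermined, geometrically growing batch schedule, and after each batch, arms whose upper confidence bound falls below the best lower confidence bound are eliminated. The function name, the Bernoulli reward model, and the particular schedule are all assumptions made for this sketch.

```python
import math
import random

def batched_successive_elimination(means, horizon, num_batches):
    """Sketch: eliminate clearly suboptimal arms between batches.

    `means` are the unknown Bernoulli reward means, used here only
    to simulate the environment; a real learner never sees them.
    """
    k = len(means)
    active = list(range(k))
    counts = [0] * k
    sums = [0.0] * k
    # Predetermined geometric schedule: per-arm pulls grow with each batch.
    pulls_per_batch = [max(1, int(horizon ** ((b + 1) / num_batches)) // k)
                       for b in range(num_batches)]
    total = 0
    for pulls in pulls_per_batch:
        for arm in active:
            for _ in range(pulls):
                if total >= horizon:
                    break
                reward = 1.0 if random.random() < means[arm] else 0.0
                counts[arm] += 1
                sums[arm] += reward
                total += 1

        def mean(a):
            return sums[a] / counts[a] if counts[a] else 0.0

        def radius(a):
            # Hoeffding-style confidence radius.
            n = counts[a]
            return math.sqrt(2 * math.log(horizon) / n) if n else float("inf")

        best_lcb = max(mean(a) - radius(a) for a in active)
        active = [a for a in active if mean(a) + radius(a) >= best_lcb]
    return active
```

Note that the schedule is fixed before any rewards are observed, so the policy only adapts at the (logarithmically few) batch boundaries, which is the defining constraint of the batched model.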


Similar articles

Regret Bounds for Deterministic Gaussian Process Bandits

This paper analyzes the problem of Gaussian process (GP) bandits with deterministic observations. The analysis uses a branch and bound algorithm that is related to the UCB algorithm of (Srinivas et al., 2010). For GPs with Gaussian observation noise, with variance strictly greater than zero, (Srinivas et al., 2010) proved that the regret vanishes at the approximate rate of O(1/√t), where t...

Full text

Logarithmic regret bounds for Bandits with Knapsacks

Achievable regret bounds for Multi-Armed Bandit problems are now well-documented. They can be classified into two categories based on the dependence on the time horizon T: (1) small, distribution-dependent, bounds of order of magnitude ln(T) and (2) robust, distribution-free, bounds of order of magnitude √T. The Bandits with Knapsacks theory, an extension to the fram...

Full text

Instance-dependent Regret Bounds for Dueling Bandits

We study the multi-armed dueling bandit problem in which feedback is provided in the form of relative comparisons between pairs of actions, with the goal of eventually learning to select actions that are close to the best. Following Dudík et al. (2015), we aim for algorithms whose performance approaches that of the optimal randomized choice of actions, the von Neumann winner, expressly avoiding...

Full text

Regret Bounds for Restless Markov Bandits

We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that after T steps achieves Õ(√T) regret with respect to the best policy that knows the distributions of all arms. No assumptions on the Markov chains are made except that they are irreducible. In addition, we sho...

Full text

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we an...
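The semi-bandit interaction described above can be sketched with a simple cardinality constraint (choose m of k items): at each step the agent picks the m items with the highest UCB indices, observes each chosen item's stochastic weight individually (semi-bandit feedback), and receives their sum as payoff. The function name, the Bernoulli weight model, and the constraint are assumptions for this sketch, not the paper's exact setup.

```python
import math
import random

def comb_ucb_semi_bandit(weights, horizon, m):
    """Sketch: UCB-style combinatorial semi-bandit under a
    cardinality constraint (pick m of len(weights) items).

    `weights` are the unknown Bernoulli means of the items,
    used only to simulate the environment.
    """
    k = len(weights)
    counts = [0] * k
    sums = [0.0] * k
    total_payoff = 0.0
    for t in range(1, horizon + 1):
        def index(i):
            if counts[i] == 0:
                return float("inf")  # force initial exploration of every item
            return sums[i] / counts[i] + math.sqrt(1.5 * math.log(t) / counts[i])
        # Oracle step: under a cardinality constraint, the best feasible
        # subset is simply the m items with the largest indices.
        chosen = sorted(range(k), key=index, reverse=True)[:m]
        for i in chosen:
            w = 1.0 if random.random() < weights[i] else 0.0
            counts[i] += 1
            sums[i] += w
            total_payoff += w
    return total_payoff, counts
```

The key feature of semi-bandit feedback, visible in the inner loop, is that the weight of every chosen item is observed separately rather than only their sum, which is what allows per-item confidence intervals.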

Full text


Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i8.16901